Approximation Algorithms for k-Anonymity
Authors
Abstract
We consider the problem of releasing a table containing personal records while ensuring individual privacy and maintaining data integrity to the extent possible. One of the techniques proposed in the literature is k-anonymization. A release is considered k-anonymous if the information corresponding to any individual in the release cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the release. To achieve k-anonymization, some of the entries of the table are either suppressed or generalized (e.g., an Age value of 23 could be changed to the Age range 20-25). The goal is to lose as little information as possible while ensuring that the release is k-anonymous. This optimization problem is referred to as the k-Anonymity problem. We show that the k-Anonymity problem is NP-hard even when the attribute values are ternary and only suppression of entries is allowed. On the positive side, we provide an O(k)-approximation algorithm for the problem. We also give improved results for specific values of k: a 1.5-approximation algorithm for the special case of 2-Anonymity, and a 2-approximation algorithm for 3-Anonymity.
∗ A preliminary version of this paper appeared in the Proceedings of the 10th International Conference on Database Theory (ICDT'05) (Aggarwal, Feder, Kenthapadi, Motwani, Panigrahy, Thomas, and Zhu, 2005).
Manuscript received 6 Aug 2005, accepted 8 Nov 2005, published 20 Nov 2005.
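The definition of a k-anonymous release in the abstract can be illustrated with a small check. The following is a minimal sketch, not the paper's approximation algorithm: it assumes a suppression-only model in which a suppressed entry is written as `*`, and the names `STAR`, `is_k_anonymous`, and `suppression_cost`, as well as the toy table, are hypothetical.

```python
from collections import Counter

STAR = "*"  # marker for a suppressed entry (an assumption of this sketch)


def is_k_anonymous(rows, k):
    """Return True if every row is identical to at least k - 1 other rows."""
    counts = Counter(tuple(row) for row in rows)
    return all(count >= k for count in counts.values())


def suppression_cost(original, released):
    """Count the entries that were replaced by STAR in the released table."""
    return sum(
        1
        for orig_row, rel_row in zip(original, released)
        for o, r in zip(orig_row, rel_row)
        if r == STAR and o != STAR
    )


# Toy table with columns (Age, Sex, ZIP); the values are made up.
original = [
    ("23", "M", "94305"),
    ("24", "M", "94305"),
    ("31", "F", "94040"),
    ("35", "F", "94040"),
]

# Suppressing the Age column makes each row match exactly one other row.
released = [
    (STAR, "M", "94305"),
    (STAR, "M", "94305"),
    (STAR, "F", "94040"),
    (STAR, "F", "94040"),
]

print(is_k_anonymous(original, 2))           # False
print(is_k_anonymous(released, 2))           # True
print(suppression_cost(original, released))  # 4
```

Here the number of suppressed entries stands in for the information lost; the optimization problem described in the abstract asks for a k-anonymous release that keeps this loss as small as possible.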
Similar resources
Theory of Privacy and Anonymity
3 k-Anonymity with hierarchy-based generalization
3.1 Problem complexity
3.2 Algorithms for k-anonymity
3.2.1 Samarati's Algorithm
3.2.2 Incognito ...
Enforcement of k-anonymity Through Generalization and Suppression
While a limited data set has been shown not to guarantee anonymity, k-anonymity was proposed by Dr. Latanya Sweeney of MIT as an alternative way to release public information while ensuring both data privacy and data integrity [1, 2, 3]. k-anonymity is achieved using generalization and suppression techniques. Generalization involves replacing a value with a less specific but semantically consistent v...
Adaptive Anonymity via b-Matching
The adaptive anonymity problem is formalized where each individual shares their data along with an integer value to indicate their personal level of desired privacy. This problem leads to a generalization of k-anonymity to the b-matching setting. Novel algorithms and theory are provided to implement this type of anonymity. The relaxation achieves better utility, admits theoretical privacy guara...
Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity
Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the k-cent...
Utility-Preserving k-Anonymity
As technology advances and more and more person-specific data like health information becomes publicly available, much attention is being given to confidentiality and privacy protection. On one hand, increased availability of information can lead to advantageous knowledge discovery; on the other hand, this information belongs to individuals and their identities must not be disclosed without con...